Abstract: We consider a Markov chain $X_n$ with spectral gap in $L^2(\pi)$ space.
Assume that $f$ is a bounded real-valued function on $\mathcal{X}$. Then the
probabilities of large deviations of the sums $S_n = \sum_{k=1}^n f(X_k)$ satisfy
Hoeffding-type inequalities. These bounds depend only on the stationary mean, the
spectral gap and the end-points of the support of $f$. We generalize the results
of [LP04] in two directions. In our paper the state space is general and we
do not assume reversibility.

1. Introduction

Consider a Markov chain $(X_n)_{n \ge 0}$ with values in a Polish space $\mathcal{X}$ with Borel
$\sigma$-field $\mathcal{B}(\mathcal{X})$ and stationary distribution $\pi$, and a function
$f : \mathcal{X} \to [0, 1]$. Denote by $\mu = \pi f$ the stationary mean value of $f$. Let $S_n$ be
the partial sum of the $f(X_k)$, i.e. $S_n = \sum_{k=1}^n f(X_k)$. The main goal of this paper is
to derive bounds on the probabilities of large deviations of $S_n$. We prove theorems analogous
to those of [LP04] in a more general setting: the state space is general and we do not assume
reversibility. The following bound is a consequence of our main result:

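1.1 Theorem. If the chain $X_n$ is $\psi$-irreducible and if there exists $\lambda$ such that for every
function $g$ with $\pi g = 0$ the norm satisfies $\|Pg\|_{L^2(\pi)} \le \lambda \|g\|_{L^2(\pi)}$,
then the following inequality is satisfied:

$$P_\pi\left(S_n \ge n(\mu + \varepsilon)\right) \le \exp\left\{-2\,\frac{1-\lambda}{1+\lambda}\,\varepsilon^2 n\right\}.$$
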
Inequalities of this form can play an important role in Markov chain Monte Carlo
(MCMC) algorithms because they bound the error of estimation. Results
of this type have been obtained for uniformly ergodic chains in [GO02] and
improved in [KLMM05]. In the case when the state space is discrete, related
results are obtained in [Lez98] and [LP04].
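
As a worked illustration of how such a bound controls estimation error, suppose the Hoeffding-type bound has the form $\exp\{-2\frac{1-\lambda}{1+\lambda}\varepsilon^2 n\}$ (the form of the final bound in Theorem 3.3). The symbol $\delta$ (a target failure probability) and the numerical values below are assumptions introduced only for this example. Solving for $n$ gives a sufficient run length:

```latex
\exp\left\{-2\,\frac{1-\lambda}{1+\lambda}\,\varepsilon^{2} n\right\} \le \delta
\quad\Longleftrightarrow\quad
n \ge \frac{1+\lambda}{2(1-\lambda)\,\varepsilon^{2}}\,\log\frac{1}{\delta}.
% Example: \lambda = 0.5, \varepsilon = 0.1, \delta = 0.05 gives
% n \ge 150 \cdot \log 20 \approx 449.4, so n = 450 steps suffice.
```

The factor $(1+\lambda)/(1-\lambda)$ is the price paid for dependence: for $\lambda = 0$ the requirement reduces to the classical Hoeffding sample size for independent draws.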

We use a technique similar to that of [LP04]. The first step is to construct an
associated chain $X'_n$ and reduce the problem to properties of the operator corresponding
to $\exp(tS'_n)$. In the second step the problem is reduced to the two-state
space case.

The paper is organized as follows. In Section 2 we introduce our notation.
The main results are established in Section 3. Finally, in Section 4 we discuss
ways to check the existence of a spectral gap in the non-reversible case.
Proofs of technical lemmas are postponed to Section 5.



2. Definitions and notation 

Throughout this paper $(X_n)_{n \ge 0}$ represents a $\psi$-irreducible Markov chain on a
Polish space $\mathcal{X}$ with $\sigma$-field $\mathcal{B}(\mathcal{X})$, transition kernel $P(x, A)$ and stationary
distribution $\pi$. Recall that a Markov chain is $\psi$-irreducible if there exists a non-trivial
measure $\psi$ such that for all $A \in \mathcal{B}(\mathcal{X})$ with $\psi(A) > 0$ and for all $x \in \mathcal{X}$ we have
$P_x(\tau_A < \infty) > 0$, where $\tau_A$ is the first return time to the set $A$, i.e. $\tau_A = \inf\{n \ge 1 : X_n \in A\}$.

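The linear operator $P$ associated with the transition kernel $P(x, A)$ acts to the right
on functions and to the left on measures, so that

$$Pg(x) = \int_{\mathcal{X}} g(y)\, P(x, dy), \qquad \nu P(A) = \int_{\mathcal{X}} P(x, A)\, \nu(dx).$$

For every measure $\nu$ on $\mathcal{B}(\mathcal{X})$ and every function $g : \mathcal{X} \to \mathbb{R}$ we denote:

$$\nu g = \int_{\mathcal{X}} g(x)\, \nu(dx).$$
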

Consider $P$ as an operator on the Hilbert space $L^2(\pi)$, the space of functions such that
$\pi(f^2) < \infty$, with inner product $\langle f, g \rangle = \int_{\mathcal{X}} f(x) g(x)\, \pi(dx)$. The norm in $L^2(\pi)$ is
denoted by $\|\cdot\|_{L^2(\pi)}$. As usual, the norm of an operator $T$ on $L^2(\pi)$ is defined by

$$\|T\|_{L^2(\pi)} = \sup_{\{f \,:\, \|f\|_{L^2(\pi)} = 1\}} \|Tf\|_{L^2(\pi)}.$$

3. Main result

We assume that the transition operator satisfies:

3.1 Assumption.

$$\|P - \Pi\|_{L^2(\pi)} \le \lambda < 1,$$

where $\Pi = \mathbf{1} \otimes \pi$, i.e. $\Pi g = \pi(g)\mathbf{1}$.



3.2 REMARK. Note that for reversible chains this assumption is equivalent to the
existence of a spectral gap and hence is equivalent to geometric ergodicity [KM09,
RR97]. In the non-reversible case there exist geometrically ergodic chains such that
Assumption 3.1 does not hold for any of the $n$-step transition operators
[KM09]. We will discuss this issue in more detail in Section 4.

Let $f$ be a function from $\mathcal{X}$ to $[0, 1]$ and let $S_n$ be the sum $S_n = \sum_{k=1}^n f(X_k)$.

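3.3 Theorem. Let $X_n$ be a $\psi$-irreducible Markov chain with stationary distribution
$\pi$. Moreover, let $1 - \lambda$ be the spectral gap of the transition kernel $P$. Then the
following bounds hold for all $\varepsilon > 0$ such that $\mu + \varepsilon < 1$:

$$P_\pi\left(S_n \ge n(\mu + \varepsilon)\right)
\le \left[\frac{\mu + \bar\mu\lambda}{1 - 2\,\frac{\bar\mu - \varepsilon}{1 + \sqrt{\Delta}}}\right]^{n(\mu+\varepsilon)}
\left[\frac{\bar\mu + \mu\lambda}{1 - 2\,\frac{\mu + \varepsilon}{1 + \sqrt{\Delta}}}\right]^{n(\bar\mu-\varepsilon)}
\le \exp\left\{-2\,\frac{1-\lambda}{1+\lambda}\,\varepsilon^2 n\right\},$$

where $\bar\mu = 1 - \mu$ and

$$\Delta = 1 + \frac{4\lambda(\mu + \varepsilon)(\bar\mu - \varepsilon)}{\mu\bar\mu(1 - \lambda)^2}.$$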
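
To get a feeling for the two bounds of Theorem 3.3, they can be evaluated numerically. The following is a minimal sketch, assuming the bounds read as in the docstrings below; the function names `hoeffding_bound` and `sharp_bound` and the input values are hypothetical and chosen only for illustration.

```python
import math


def hoeffding_bound(n, eps, lam):
    """Simpler bound: exp{-2 (1 - lam) / (1 + lam) * eps^2 * n}."""
    return math.exp(-2.0 * (1.0 - lam) / (1.0 + lam) * eps ** 2 * n)


def sharp_bound(n, mu, eps, lam):
    """Sharper two-factor bound.

    Assumes 0 < mu < 1, eps > 0, mu + eps < 1 and 0 <= lam < 1.
    """
    mu_bar = 1.0 - mu
    up, down = mu + eps, mu_bar - eps
    delta = 1.0 + 4.0 * lam * up * down / (mu * mu_bar * (1.0 - lam) ** 2)
    s = 1.0 + math.sqrt(delta)
    # Work with logarithms to avoid underflow for large n.
    log_a = math.log((mu + mu_bar * lam) / (1.0 - 2.0 * down / s))
    log_b = math.log((mu_bar + mu * lam) / (1.0 - 2.0 * up / s))
    return math.exp(n * (up * log_a + down * log_b))


# Hypothetical inputs, for illustration only.
n, mu, eps, lam = 100, 0.3, 0.1, 0.5
print(sharp_bound(n, mu, eps, lam), hoeffding_bound(n, eps, lam))
```

For $\lambda = 0$ the sharper expression reduces to the classical relative-entropy (Chernoff) bound for independent Bernoulli variables, which is a convenient sanity check for the formula.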



To prove Theorem 3.3 we need to consider a new Markov chain $(X'_n)_{n \ge 1}$ defined
by the transition kernel $Q$ such that for all $A \in \mathcal{B}(\mathcal{X})$,

$$Q(x, A) = (1 - \lambda)\pi(A) + \lambda\,\mathbb{I}(x \in A),$$

where $\lambda$ appears in Assumption 3.1.
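
On a finite state space the kernel $Q$ is just a matrix, which makes its key properties easy to check. The sketch below is an illustration under that finite-state assumption; the helper names `associated_kernel` and `l2_pi_norm` and the values of `pi` and `lam` are hypothetical.

```python
import numpy as np


def associated_kernel(pi, lam):
    """Q = (1 - lam) * 1 pi' + lam * I on a finite state space."""
    pi = np.asarray(pi, dtype=float)
    return (1.0 - lam) * np.tile(pi, (len(pi), 1)) + lam * np.eye(len(pi))


def l2_pi_norm(T, pi):
    """Operator norm of the matrix T acting on functions in L^2(pi).

    Uses the isometry f -> sqrt(pi) * f onto Euclidean space.
    """
    d = np.sqrt(np.asarray(pi, dtype=float))
    return np.linalg.norm((d[:, None] * T) / d[None, :], 2)


pi = np.array([0.5, 0.25, 0.25])   # hypothetical stationary distribution
lam = 0.3                          # hypothetical value of lambda
Q = associated_kernel(pi, lam)
Pi = np.tile(pi, (3, 1))           # matrix of the projection g -> pi(g) 1

print(pi @ Q)                      # pi is stationary for Q
print(l2_pi_norm(Q - Pi, pi))      # equals lam up to rounding
```

In words: with probability $\lambda$ the chain stays where it is, otherwise it jumps to an independent draw from $\pi$, so $Q$ is reversible and satisfies Assumption 3.1 with equality.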

For any bounded linear operator $T$ on $L^2(\pi)$ and any $t \in \mathbb{R}$ define the operator
$T_t(g) = e^{tf/2}\, T(e^{tf/2} g)$.

3.4 Lemma. If $Q_t$ is defined as above and $P$ satisfies 3.1, then

$$E_\pi \exp(tS_n) \le \|Q_t\|^n_{L^2(\pi)}.$$




As in [LP04] we define a two-state chain. Let $(Y_n)_{n \ge 1}$ be a Markov chain on the
space $\{0, 1\}$ with a transition matrix with second largest eigenvalue $\lambda \ge 0$ and
stationary distribution $\boldsymbol{\mu} = [1 - \mu, \mu]' := [\bar\mu, \mu]'$. Then the transition matrix can be
written as

$$M_{\mu,\lambda} = \lambda I + (1 - \lambda)\mathbf{1}\boldsymbol{\mu}'.$$

Let $D_t^2 = \operatorname{diag}(1, e^t)$ and denote by $\theta_t$ the Perron-Frobenius eigenvalue of
$D_t M_{\mu,\lambda} D_t$. By the same arguments as in Theorem 2 from [LP04] we obtain:
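
The two-state objects above are small enough to compute directly. The following sketch assumes the reading $M_{\mu,\lambda} = \lambda I + (1-\lambda)\mathbf{1}\boldsymbol{\mu}'$ and $D_t^2 = \operatorname{diag}(1, e^t)$; the function names `two_state_matrix` and `theta` and the sample values are hypothetical.

```python
import numpy as np


def two_state_matrix(mu, lam):
    """M = lam * I + (1 - lam) * 1 mu_vec', with mu_vec = [1 - mu, mu]."""
    mu_vec = np.array([1.0 - mu, mu])
    return lam * np.eye(2) + (1.0 - lam) * np.outer(np.ones(2), mu_vec)


def theta(t, mu, lam):
    """Largest eigenvalue of D M D, where D^2 = diag(1, e^t)."""
    d = np.array([1.0, np.exp(t / 2.0)])
    A = d[:, None] * two_state_matrix(mu, lam) * d[None, :]
    return float(np.max(np.linalg.eigvals(A).real))


M = two_state_matrix(0.3, 0.4)             # hypothetical mu and lambda
print(np.sort(np.linalg.eigvals(M).real))  # eigenvalues lambda and 1
print(np.array([0.7, 0.3]) @ M)            # [1 - mu, mu] is stationary
print(theta(0.7, 0.3, 0.0))                # lam = 0: Bernoulli moment generating function
```

For $\lambda = 0$ the matrix has identical rows, the $Y_n$ are independent Bernoulli($\mu$) variables, and $\theta_t = \bar\mu + \mu e^t$, which is the usual moment generating function.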



3.5 Theorem. Let $X'_n$ have transition kernel $Q$ and let $\mu = E_\pi f(X'_k)$. Then
for every convex function $G : \mathbb{R} \to \mathbb{R}$ we have

$$E_\pi\left[G\left(f(X'_1) + \dots + f(X'_n)\right)\right] \le E_{\boldsymbol{\mu}}\left[G\left(Y_1 + \dots + Y_n\right)\right],$$

where $Y_n$ is a Markov chain with transition matrix $M_{\mu,\lambda}$.



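We define the function

$$g(x) = \frac{e^{tf(x)/2}}{r_t - \lambda e^{tf(x)}}.$$

If $r_t$ is a solution of the equation

$$\int_{\mathcal{X}} \frac{(1 - \lambda)e^{tf(x)}}{r_t - \lambda e^{tf(x)}}\, \pi(dx) = 1 \qquad (3.6)$$

and if $r_t > \lambda \operatorname{ess\,sup}_{x \in \mathcal{X}}(e^{tf(x)})$, then the function $g$ is positive ($\pi$-a.s.) and is an
eigenfunction of $Q_t$ with eigenvalue $r_t$. Unfortunately, for some functions $f$
equation (3.6) can have no solution. To avoid this problem let us approximate
$f$ by functions with a finite number of values. For all $k \in \mathbb{Z}_+$ define

$$f_k(x) = \sum_{i=1}^{k} \frac{i}{k}\, \mathbb{I}(x \in A_{i,k}),$$

where $A_{i,k} := \{x \in \mathcal{X} : \frac{i-1}{k} < f(x) \le \frac{i}{k}\}$. We define $\mu_k$ and $Q_{t,k}$ by replacing
$f$ with $f_k$ in the definitions of $\mu$ and $Q_t$ respectively. The operator $Q$ is positive (i.e.
if $h \ge 0$ then $Qh \ge 0$), so the operator $Q_t$ is positive too and

$$\|Q_t\|_{L^2(\pi)} = \sup_{\{h \ge 0 \,:\, \|h\|_{L^2(\pi)} = 1\}} \langle h, Q_t h \rangle
\le \sup_{\{h \ge 0 \,:\, \|h\|_{L^2(\pi)} = 1\}} \langle h, Q_{t,k} h \rangle = \|Q_{t,k}\|_{L^2(\pi)}. \qquad (3.7)$$
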



By the dominated convergence theorem $\lim_{k \to \infty} \mu_k = \mu$. Let $\theta_t(x)$ be the Perron-Frobenius
eigenvalue of $D_t M_{x,\lambda} D_t$. The function $\theta_t(x)$ is continuous, so $\theta_{t,k} := \theta_t(\mu_k)$
converges to $\theta_t$ as $k$ tends to infinity. First we show that (3.6) has a solution for
any function $f_k$. We consider the function

$$F(r) = \int_{\mathcal{X}} \frac{(1 - \lambda)e^{tf_k(x)}}{r - \lambda e^{tf_k(x)}}\, \pi(dx);$$

for $r > \lambda \operatorname{ess\,sup}_{x \in \mathcal{X}}(e^{tf_k(x)})$ this function is continuous. If $r$ tends to infinity
then $F(r)$ tends to zero. Moreover, there exists $1 \le j \le k$ such that $a := e^{tj/k} =
\operatorname{ess\,sup}_{x \in \mathcal{X}}(e^{tf_k(x)})$ and the $\pi$-measure of the set $C_a := \{x \in \mathcal{X} : a = e^{tf_k(x)}\}$ is equal
to $d > 0$. So if $r$ tends to $\lambda a$ then $F(r)$ tends to infinity. Hence there exists $r_{t,k}$
such that $F(r_{t,k}) = 1$, and

$$g_k(x) := \frac{e^{tf_k(x)/2}}{r_{t,k} - \lambda e^{tf_k(x)}}$$

is an eigenfunction of $Q_{t,k}$ with eigenvalue $r_{t,k}$.
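
The existence argument for $r_{t,k}$ is constructive: $F$ is continuous and decreasing, so the root of $F(r) = 1$ can be located by bisection. The sketch below illustrates this on a hypothetical three-point state space with $t > 0$, assuming $T_t(g) = e^{tf/2}T(e^{tf/2}g)$ and the eigenfunction $e^{tf_k/2}/(r - \lambda e^{tf_k})$; the function name `solve_r` and all numerical values are assumptions made for the example.

```python
import numpy as np


def solve_r(pi, f, lam, t, iters=200):
    """Bisection for F(r) = 1 on (lam * a, a], where a = max_x exp(t f(x))."""
    pi = np.asarray(pi, dtype=float)
    e = np.exp(t * np.asarray(f, dtype=float))

    def F(r):
        return float(np.sum(pi * (1.0 - lam) * e / (r - lam * e)))

    lo, hi = lam * e.max(), e.max()  # F is decreasing and F(hi) <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical example: finitely-valued f on three states, t > 0.
pi = np.array([0.5, 0.25, 0.25])
f = np.array([0.0, 0.5, 1.0])
lam, t = 0.3, 1.0

r = solve_r(pi, f, lam, t)
e_half = np.exp(t * f / 2.0)
Q = (1.0 - lam) * np.tile(pi, (3, 1)) + lam * np.eye(3)
Q_t = e_half[:, None] * Q * e_half[None, :]   # matrix of g -> e^{tf/2} Q (e^{tf/2} g)
g = e_half / (r - lam * e_half ** 2)          # candidate eigenfunction

print(r, np.max(np.abs(Q_t @ g - r * g)))     # residual should be close to zero
```

The upper end of the bracket works because $a - \lambda e^{tf_k(x)} \ge (1-\lambda)e^{tf_k(x)}$ for every $x$, so each term of $F(a)$ is at most $\pi(\{x\})$ and $F(a) \le 1$.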

The next lemma completes the preparations for the proof of Theorem 3.3.

3.8 Lemma. With the above notation, for all $k \in \mathbb{Z}_+$ we have the following.

(i) $\|Q_{t,k}\|_{L^2(\pi)} = r_{t,k}$,

(ii) $\lim_{n \to \infty} \frac{1}{n} \log E_\pi \exp\left(t \sum_{i=1}^{n} f_k(X'_i)\right) = \log(r_{t,k})$.



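Proof of Theorem 3.3. By Markov's inequality, for all $t > 0$ we obtain

$$P_\pi\left(S_n \ge n(\mu + \varepsilon)\right) \le e^{-tn(\mu+\varepsilon)} E_\pi e^{tS_n}.$$

By Lemma 3.4 and (3.7) we have

$$E_\pi e^{tS_n} \le \|Q_t\|^n_{L^2(\pi)} \le \|Q_{t,k}\|^n_{L^2(\pi)}. \qquad (3.9)$$

Let $Y^{(k)}_n$ be a Markov chain with transition matrix $M_{\mu_k,\lambda}$. From Theorem 3.5
and Lemma 3.8 we obtain

$$\log\left(\|Q_{t,k}\|_{L^2(\pi)}\right) = \lim_{n \to \infty} \frac{1}{n} \log E_\pi \exp\left(t \sum_{i=1}^{n} f_k(X'_i)\right)
\le \lim_{n \to \infty} \frac{1}{n} \log E_{\boldsymbol{\mu}_k} \exp\left(t \sum_{i=1}^{n} Y^{(k)}_i\right) = \log(\theta_{t,k}),$$

therefore

$$E_\pi e^{tS_n} \le \theta^n_{t,k}.$$

Letting $k$ tend to infinity we obtain

$$E_\pi e^{tS_n} \le \theta^n_t.$$

By Proposition 2 of [LP04] we complete the proof. □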
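3.10 Corollary. With assumptions as in Theorem 3.3, for all $\varepsilon > 0$ such that
$\mu + \varepsilon < 1$ and for all measures $\nu \ll \pi$ we have:

$$P_\nu\left(S_n \ge n(\mu + \varepsilon)\right) \le \left\|\frac{d\nu}{d\pi}\right\|_p \exp\left\{-2\,\frac{1-\lambda}{q(1+\lambda)}\,\varepsilon^2 n\right\},$$

where

$$\left\|\frac{d\nu}{d\pi}\right\|_p =
\begin{cases}
\left(\int_{\mathcal{X}} \left|\frac{d\nu}{d\pi}\right|^p d\pi\right)^{1/p} & \text{if } p < \infty, \\
\operatorname{ess\,sup}_{x \in \mathcal{X}} \left|\frac{d\nu}{d\pi}(x)\right| & \text{if } p = \infty,
\end{cases}$$

and $\frac{1}{p} + \frac{1}{q} = 1$.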



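Proof. From Hölder's inequality we have

$$\begin{aligned}
P_\nu\left(S_n \ge n(\mu + \varepsilon)\right)
&= \int_{\mathcal{X}} \frac{d\nu}{d\pi}(x)\, P_x\left(S_{n-1} \ge n(\mu + \varepsilon) - f(x)\right) \pi(dx) \\
&\le \left\|\frac{d\nu}{d\pi}\right\|_p \left[\int_{\mathcal{X}} P_x\left(S_{n-1} \ge n(\mu + \varepsilon) - f(x)\right)^q \pi(dx)\right]^{1/q} \\
&\le \left\|\frac{d\nu}{d\pi}\right\|_p \left[\int_{\mathcal{X}} P_x\left(S_{n-1} \ge n(\mu + \varepsilon) - f(x)\right) \pi(dx)\right]^{1/q} \\
&= \left\|\frac{d\nu}{d\pi}\right\|_p P_\pi\left(S_n \ge n(\mu + \varepsilon)\right)^{1/q},
\end{aligned}$$

and the claim follows from Theorem 3.3. □

4. Spectral gap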



In this section we discuss Assumption 3.1 and, more generally, the existence of a spectral
gap in $L^2(\pi)$ for non-reversible Markov chains. We start with the definition of the
spectral gap.

4.1 Definition. We say that a transition kernel admits a spectral gap $1 - \rho$ iff

$$\sup\{|z| : z \in \sigma(P) \setminus \{1\}\} = \rho < 1,$$

where $\sigma(\cdot)$ denotes the spectrum of an operator in the $L^2(\pi)$ space (i.e. $z \in \sigma(P)$ iff
$P - zI$ is a non-invertible operator in the $L^2(\pi)$ space).

By the spectral radius formula, Definition 4.1 is equivalent to the existence of $n_0$
such that $\|P^{n_0} - \Pi\|_{L^2(\pi)} < 1$. Hence Assumption 3.1 implies that the kernel $P$ admits a
spectral gap; conversely, if the kernel $P$ has a spectral gap $1 - \rho$, then there exists $n_0$ such that
the $n_0$-step kernel $P^{n_0}$ satisfies 3.1.

In addition, for reversible chains the existence of a spectral gap implies 3.1 for the kernel
$P$ and is equivalent to geometric ergodicity (see [KM09, RR97] for details). So
in the reversible case Assumption 3.1 is well investigated and it is well known how
to check it.

We will focus on the non-reversible case. Let $P$ be a non-reversible transition
kernel. We will construct a new kernel which is reversible, and we will express
Assumption 3.1, and then the spectral gap property, in terms of this new kernel. For a
transition operator $P$ let $P^*$ denote its adjoint operator. It is easy to check that
$P^*$ is also a transition operator. Since the state space $\mathcal{X}$ is a Polish space, there exists
a Markov chain $(X^*_n)_{n \ge 1}$ with transition kernel $P^*(x, A) := (P^*\mathbb{I}_A)(x)$. By the
definition of adjoint operators the transition kernels satisfy

$$\int_{x \in A}\int_{y \in B} P(x, dy)\,\pi(dx) = \int_{x \in A}\int_{y \in B} P^*(y, dx)\,\pi(dy),$$

or shorter

$$\pi(dx)P(x, dy) = \pi(dy)P^*(y, dx).$$
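
On a finite state space the relation $\pi(dx)P(x,dy) = \pi(dy)P^*(y,dx)$ determines the adjoint kernel explicitly as $P^*(y,x) = \pi(x)P(x,y)/\pi(y)$. The sketch below illustrates this; the function name `adjoint_kernel` and the three-state chain are hypothetical and not taken from the paper.

```python
import numpy as np


def adjoint_kernel(P, pi):
    """P*(y, x) = pi(x) P(x, y) / pi(y); requires pi(y) > 0 for every y."""
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return (P * pi[:, None]).T / pi[:, None]


# Hypothetical non-reversible chain with stationary distribution [1/5, 2/5, 2/5].
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
pi = np.array([0.2, 0.4, 0.4])

P_star = adjoint_kernel(P, pi)
print(P_star)                 # again a stochastic matrix
print(pi @ P_star)            # pi is stationary for P* as well
```

The chain is reversible exactly when `P_star` coincides with `P`; here it does not, and `P_star` describes the same chain run backwards in time under stationarity.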

In general it is often impossible or very difficult to find an exact expression for $P^*$,
but in several cases it is easy. The example below shows one such situation.
4.2 EXAMPLE. Let us consider the deterministic scan Gibbs sampler on the product
space $\mathcal{X}^d$ with transition kernel $P = P_1 P_2 \cdots P_d$. The $i$-th component $P_i$ of
the Gibbs sampler $P$ replaces $x_i$ by a draw from the conditional distribution
$\pi(x_i | x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$. Assume that $\pi$ and all conditional distributions
have densities; then the density of the transition kernel is $P_i(x, y) = \pi(y_i | x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$
if $y_j = x_j$ for $j \in \{1, \dots, i-1, i+1, \dots, d\}$ and $P_i(x, y) = 0$ otherwise. Hence for
all $x, y \in \mathcal{X}^d$ and all $1 \le i \le d$ we have

$$\pi(dx)P_i(x, dy) = \pi(dy)P_i(y, dx)$$

and further $P_i = P_i^*$. Thus $P^* = P_d P_{d-1} \cdots P_1$, so $P^*$ is also a Gibbs sampler but
with the reverse order of components.

The next proposition gives us tools for checking Assumption 3.1 or the existence of a
spectral gap.

4.3 Proposition. For any transition kernel $P$, adjoint kernel $P^*$ and all $n > 0$
we have

$$\|P^n - \Pi\|^2_{L^2(\pi)} = \|P^n P^{*n} - \Pi\|_{L^2(\pi)}.$$

Proof. For any given $n$ we have

$$\|P^n - \Pi\|^2_{L^2(\pi)} = \|(P^n - \Pi)(P^n - \Pi)^*\|_{L^2(\pi)}. \qquad (4.4)$$

Since $\pi$ is a stationary distribution for both kernels $P$ and $P^*$, by the definition of
$\Pi$ we obtain

$$(P^n - \Pi)(P^n - \Pi)^* = P^n P^{*n} - P^n\Pi - \Pi P^{*n} + \Pi^2 = P^n P^{*n} - \Pi. \qquad (4.5)$$

Applying (4.5) to (4.4) we finish the proof. □

There are, necessarily non-reversible, geometrically ergodic chains that do not
admit a spectral gap in $L^2(\pi)$; see [KM09] for examples. This implies that for these
chains Assumption 3.1 is not satisfied by any of the $n$-step kernels $P^n$. Furthermore,
the next example shows that even if the chain is uniformly ergodic,
3.1 may fail to hold.

4.6 EXAMPLE. Let $(X_n)_{n \ge 1}$ be the Markov chain on the space $\{0, 1, 2\}$ with
transition probabilities as follows:

$$P(0, 0) = P(0, 1) = \tfrac{1}{2}, \qquad P(1, 2) = 1, \qquad P(2, 0) = 1,$$

with all other transition probabilities equal to zero. This chain is aperiodic and
irreducible on a finite state space, so it is uniformly ergodic. It is easy to check that
the stationary distribution is $\pi = [1/2, 1/4, 1/4]'$. The transition probability from state
2 to 1 for the adjoint kernel satisfies $\pi(2)P^*(2, 1) = \pi(1)P(1, 2)$; since $\pi(1) = \pi(2)$,
we get $P^*(2, 1) = 1$. Hence $PP^*(1, 1) = P(1, 2)P^*(2, 1) = 1$, and the Markov chain with kernel $PP^*$
is reversible and not ergodic. So by Proposition 1.2 from [KM09] the kernel $PP^*$ does not
admit a spectral gap, and by reversibility we have $\|PP^* - \Pi\|_{L^2(\pi)} = 1$.
Proposition 4.3 implies that Assumption 3.1 is not satisfied for the Markov chain
$(X_n)_{n \ge 1}$.
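
The claims of this example can be checked numerically. The sketch below assumes the transition matrix has rows $[1/2, 1/2, 0]$, $[0, 0, 1]$, $[1, 0, 0]$ (row 0 is completed so that it sums to one) and computes $L^2(\pi)$ operator norms through the isometry $f \mapsto \sqrt{\pi} f$; the helper name `l2_pi_norm` is hypothetical.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
pi = np.array([0.5, 0.25, 0.25])

# Adjoint kernel on a finite space: P*(y, x) = pi(x) P(x, y) / pi(y).
P_star = (P * pi[:, None]).T / pi[:, None]
PP_star = P @ P_star
Pi = np.tile(pi, (3, 1))          # matrix of the projection g -> pi(g) 1


def l2_pi_norm(T, pi):
    """Operator norm of the matrix T acting on functions in L^2(pi)."""
    d = np.sqrt(pi)
    return np.linalg.norm((d[:, None] * T) / d[None, :], 2)


norm_P = l2_pi_norm(P - Pi, pi)
norm_PP = l2_pi_norm(PP_star - Pi, pi)

print(PP_star)                    # state 1 is absorbing for PP*
print(norm_P, norm_PP)            # both equal 1 up to rounding
print(np.all(np.linalg.matrix_power(P, 4) > 0))  # P^4 has strictly positive entries
```

The last line reflects uniform ergodicity (some power of $P$ is strictly positive), while the norms show that Assumption 3.1 nevertheless fails for the one-step kernel.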

The next proposition summarizes the presented properties of the spectral gap in terms of
the kernel $PP^*$.

4.7 Proposition. For any transition kernel $P$ we have:

a) $P$ satisfies Assumption 3.1 $\iff$ the Markov chain with transition kernel $PP^*$
is geometrically ergodic.

b) $P$ admits a spectral gap $\iff$ there exists $n$ such that the Markov chain with
transition kernel $P^n P^{*n}$ is geometrically ergodic.

Proof. The first part is true by Proposition 4.3 and the fact that for reversible chains
the spectral gap property, Assumption 3.1 and geometric ergodicity are equivalent.
The second part is obtained by the same arguments combined with the fact
that a chain admits a spectral gap iff there exists $n$ such that $\|P^n - \Pi\|^2_{L^2(\pi)} < 1$. □

4.8 Corollary. If the Markov chain with transition kernel $P$ is uniformly ergodic,
so that $P$ satisfies the small set condition with $\mathcal{X}$ as a small set (i.e. there exist $h > 0$, $\beta > 0$
and a measure $\nu$ such that $P^h(x, A) \ge \beta\nu(A)$), then the kernel $P^h$ satisfies
Assumption 3.1.

To prove this corollary we need a simple lemma, whose proof is postponed
to Section 5.

4.9 Lemma. If for some set $C$ with positive $\pi$-measure, some $\beta > 0$ and some
probability measure $\nu$ we have

$$P(x, A) \ge \beta\,\mathbb{I}(x \in C)\,\nu(A), \qquad A \in \mathcal{B}(\mathcal{X}),\ x \in \mathcal{X}, \qquad (4.10)$$

then

$$PP^*(x, A) \ge \tfrac{1}{4}\beta^2\,\mathbb{I}(x \in C)\,\pi(C \cap A), \qquad A \in \mathcal{B}(\mathcal{X}),\ x \in \mathcal{X}.$$

Proof of Corollary 4.8. From Lemma 4.9 we obtain that the Markov chain with transition kernel
$P^h P^{*h}$ is uniformly ergodic, and by Proposition 4.7 we complete the proof. □



We have shown that the $L^2$ space theory can be applied to non-reversible chains. We construct
a new kernel $PP^*$ which is reversible, and it is easy to state spectral properties
of $P$ in terms of $PP^*$. In practice it is usually hard to explore chains with kernel
$PP^*$ as long as we cannot compute its density. Stating a checkable condition in
terms of the kernel $P$ alone is crucial for the application of this theory, and needs further
investigation.



5. Proofs of lemmas 



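Proof of Lemma 3.4. We know from the Cauchy-Schwarz inequality that

$$E_\pi \exp(tS_n) = \left\langle 1, (e^{tf}P)^n 1 \right\rangle = \left\langle e^{tf/2}, P_t^{n-1} e^{tf/2} \right\rangle
\le \left\|e^{tf/2}\right\|^2_{L^2(\pi)} \|P_t\|^{n-1}_{L^2(\pi)}. \qquad (5.1)$$

For any function $g$ denote its centered version by $g_c := g - \pi(g)$, and let $P_0 :=
P - \Pi$. Since $P$ satisfies 3.1, by the Cauchy-Schwarz inequality we obtain

$$\begin{aligned}
\|P_t\|_{L^2(\pi)}
&= \sup_{\{g,h \,:\, \|g\|_{L^2(\pi)} = \|h\|_{L^2(\pi)} = 1\}} \langle h, P_t g \rangle \\
&= \sup_{\{g,h \,:\, \|g\|_{L^2(\pi)} = \|h\|_{L^2(\pi)} = 1\}} \left\{\pi(e^{tf/2}h)\,\pi(e^{tf/2}g) + \left\langle (e^{tf/2}h)_c, P_0 (e^{tf/2}g)_c \right\rangle\right\} \\
&\le \sup_{\{g,h \,:\, \|g\|_{L^2(\pi)} = \|h\|_{L^2(\pi)} = 1\}} \left\{\left|\pi(e^{tf/2}h)\,\pi(e^{tf/2}g)\right| + \lambda \left\|(e^{tf/2}h)_c\right\|_{L^2(\pi)} \left\|(e^{tf/2}g)_c\right\|_{L^2(\pi)}\right\} \\
&\le \sup_{\{g,h \,:\, \|g\|_{L^2(\pi)} = \|h\|_{L^2(\pi)} = 1\}} \sqrt{\pi(e^{tf/2}g)^2 + \lambda \left\|(e^{tf/2}g)_c\right\|^2_{L^2(\pi)}}\, \sqrt{\pi(e^{tf/2}h)^2 + \lambda \left\|(e^{tf/2}h)_c\right\|^2_{L^2(\pi)}} \\
&\le \sup_{\{g \,:\, \|g\|_{L^2(\pi)} = 1\}} \left\{\pi(e^{tf/2}g)^2 + \lambda \left\|(e^{tf/2}g)_c\right\|^2_{L^2(\pi)}\right\} \\
&= \sup_{\{g \,:\, \|g\|_{L^2(\pi)} = 1\}} \left\langle e^{tf/2}g,\ \pi(e^{tf/2}g) + \lambda (e^{tf/2}g)_c \right\rangle = \|Q_t\|_{L^2(\pi)}.
\end{aligned}$$

Furthermore we have

$$\|Q_t\|_{L^2(\pi)} \ge \frac{\left\langle e^{tf/2}, Q_t e^{tf/2} \right\rangle}{\left\|e^{tf/2}\right\|^2_{L^2(\pi)}}
= \frac{\left\langle e^{tf}, Q e^{tf} \right\rangle}{\pi(e^{tf})}
= \frac{\pi(e^{tf})^2 + \lambda \pi\left((e^{tf})_c^2\right)}{\pi(e^{tf})} \ge \pi(e^{tf}) = \left\|e^{tf/2}\right\|^2_{L^2(\pi)},$$

and that, combined with (5.1), completes the proof. □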



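Proof of Lemma 3.8. Ad (i). Suppose that $\|Q_{t,k}\|_{L^2(\pi)} > r_{t,k}$. Since $Q_{t,k}$ is a
self-adjoint operator and $g_k$ is an eigenfunction of this operator, there exists a
sequence of functions $h_n$ such that the following conditions hold:

1. for all $n = 1, 2, \dots$, $\|h_n\|_{L^2(\pi)} = 1$ and

$$h_n \perp g_k, \qquad (5.2)$$

2. for all $n = 1, 2, \dots$,

$$\|Q_{t,k}\|_{L^2(\pi)} - \frac{1}{n} \le \langle h_n, Q_{t,k} h_n \rangle. \qquad (5.3)$$

The function $g_k$ is positive and

$$C_1 = \frac{1}{r_{t,k}} \le g_k \le \operatorname{ess\,sup}_{x \in \mathcal{X}} g_k(x) = C_2 \quad \pi\text{-a.s.} \qquad (5.4)$$

From (5.2) it follows that the functions $h_n^+ = \max(h_n, 0)$ and $h_n^- = \max(-h_n, 0)$
satisfy

$$\pi h_n^+ \le \frac{C_2}{C_1}\, \pi h_n^- = C_3\, \pi h_n^- \quad \text{and} \quad \pi h_n^- \le \frac{C_2}{C_1}\, \pi h_n^+ = C_3\, \pi h_n^+. \qquad (5.5)$$

For all functions $h_n$,

$$\begin{aligned}
\langle h_n, Q_{t,k} h_n \rangle &= (1 - \lambda)\left[\pi(e^{tf_k/2} h_n)\right]^2 + \lambda \pi(e^{tf_k} h_n^2) \\
&= (1 - \lambda)\left[\left(\pi(e^{tf_k/2} h_n^+)\right)^2 - 2\pi(e^{tf_k/2} h_n^+)\,\pi(e^{tf_k/2} h_n^-) + \left(\pi(e^{tf_k/2} h_n^-)\right)^2\right] + \lambda \pi(e^{tf_k} h_n^2). \qquad (5.6)
\end{aligned}$$

The operator $Q_{t,k}$ is positive, so for all functions $h$ we have $\langle |h|, Q_{t,k} |h| \rangle \ge \langle h, Q_{t,k} h \rangle$.
The functions $h_n^+$, $h_n^-$ are non-negative, so from (5.3)

$$\begin{aligned}
\frac{1}{n} &\ge \langle |h_n|, Q_{t,k} |h_n| \rangle - \langle h_n, Q_{t,k} h_n \rangle \\
&= 4(1 - \lambda)\,\pi(e^{tf_k/2} h_n^+)\,\pi(e^{tf_k/2} h_n^-) \\
&\ge 4(1 - \lambda)\,\pi(h_n^+)\,\pi(h_n^-). \qquad (5.7)
\end{aligned}$$

Finally, from (5.5), (5.6), (5.7) and the definition of $r_{t,k}$ we obtain

$$\begin{aligned}
\langle h_n, Q_{t,k} h_n \rangle &\le (1 - \lambda)\left[\left(\pi(e^{tf_k/2} h_n^+)\right)^2 + \left(\pi(e^{tf_k/2} h_n^-)\right)^2\right] + \lambda \pi(e^{tf_k} h_n^2) \\
&\le e^t (1 - \lambda)\left[(\pi(h_n^+))^2 + (\pi(h_n^-))^2\right] + r_{t,k} \\
&\le \frac{e^t C_3}{2n} + r_{t,k}.
\end{aligned}$$

Letting $n$ tend to infinity we obtain

$$\|Q_{t,k}\|_{L^2(\pi)} \le r_{t,k},$$

which contradicts the supposition. Since $r_{t,k}$ is an eigenvalue of $Q_{t,k}$, the reverse inequality also holds.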
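Ad (ii). By (i), the positivity of $Q_{t,k}$ and the bound $g_k \le C_2 \le C_2 e^{tf_k/2}$ we have

$$r_{t,k}^{n-1} = \frac{\left\langle g_k, Q_{t,k}^{n-1} g_k \right\rangle}{\|g_k\|^2_{L^2(\pi)}}
\le \frac{C_2^2}{\|g_k\|^2_{L^2(\pi)}} \left\langle e^{tf_k/2}, Q_{t,k}^{n-1} e^{tf_k/2} \right\rangle
\le \frac{C_2^2}{\|g_k\|^2_{L^2(\pi)}} \left\|e^{tf_k/2}\right\|^2_{L^2(\pi)} \|Q_{t,k}\|^{n-1}_{L^2(\pi)}
= \frac{C_2^2\, \pi(e^{tf_k})}{\|g_k\|^2_{L^2(\pi)}}\, r_{t,k}^{n-1},$$

but $E_\pi \exp\left(t \sum_{i=1}^{n} f_k(X'_i)\right) = \left\langle e^{tf_k/2}, Q_{t,k}^{n-1} e^{tf_k/2} \right\rangle$. Taking logarithms, dividing by $n$
and letting $n$ tend to infinity yields (ii). □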



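Proof of Lemma 4.9. Without loss of generality we can assume $A \cap C \ne \emptyset$. Let
$x \in C$, then:

$$PP^*(x, A) = \int_{\mathcal{X}} P^*(z, A)\,P(x, dz) \ge \beta \int_{\mathcal{X}} P^*(z, A \cap C)\,\nu(dz). \qquad (5.8)$$

By integrating (4.10) we obtain that $\nu$ is absolutely continuous with respect to $\pi$; let $\frac{d\nu}{d\pi}$ be its density.
Define the set $D(\varepsilon) := \{x \in \mathcal{X} : \frac{d\nu}{d\pi}(x) \ge \varepsilon\}$. Since $\int_{\mathcal{X}} \frac{d\nu}{d\pi}(x)\,\pi(dx) = 1$ and $\frac{d\nu}{d\pi} \ge 0$,
for all $0 < \varepsilon < 1$ we have $\pi(D(\varepsilon)) > 0$. From (5.8) we obtain:

$$PP^*(x, A) \ge \beta \int_{D(\varepsilon)} \int_{A \cap C} \varepsilon\, P^*(z, dy)\,\pi(dz) = \beta\varepsilon \int_{D(\varepsilon)} \int_{A \cap C} \pi(dy)\,P(y, dz). \qquad (5.9)$$

Hence

$$PP^*(x, A) \ge \beta^2 \varepsilon\, \nu(D(\varepsilon))\, \pi(A \cap C). \qquad (5.10)$$

To finish the proof we need to bound $\nu(D(\varepsilon))$:

$$\nu(D(\varepsilon)^c) = \int_{D(\varepsilon)^c} \frac{d\nu}{d\pi}(x)\,\pi(dx) \le \varepsilon \int_{\mathcal{X}} \pi(dx) = \varepsilon,$$

so $\nu(D(\varepsilon)) \ge 1 - \varepsilon$. Optimizing (5.10) with respect to $\varepsilon$ yields

$$PP^*(x, A) \ge \tfrac{1}{4}\beta^2\, \pi(A \cap C). \qquad (5.11)$$
□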



imsart-generic ver. 2011/11/15 file: hooefdnonreversible4.tex date: January 12, 2012 



B. Miasojedow/Hoeffding's inequalities for Markov chains 






Acknowledgements 

The author thanks Witold Bednorz, Krzysztof Łatuszyński and Wojciech Niemiro
for helpful comments.



References 

[GO02] P.W. Glynn and D. Ormoneit. Hoeffding's inequality for uniformly
ergodic Markov chains. Statistics & Probability Letters, 56(2):143-146, 2002.

[KLMM05] I. Kontoyiannis, L.A. Lastras-Montaño, and S.P. Meyn. Relative
entropy and exponential deviation bounds for general Markov chains.
In IEEE International Symposium on Information Theory, pages
1563-1567. IEEE, 2005.

[KM09] I. Kontoyiannis and S.P. Meyn. Geometric ergodicity and the
spectral gap of non-reversible Markov chains. arXiv preprint
arXiv:0906.5322, 2009.

[Lez98] P. Lezaud. Chernoff-type bound for finite Markov chains.
Annals of Applied Probability, 8(3):849-867, 1998.

[LP04] C.A. León and F. Perron. Optimal Hoeffding bounds for discrete
reversible Markov chains. Annals of Applied Probability, pages 958-970, 2004.

[RR97] G.O. Roberts and J.S. Rosenthal. Geometric ergodicity and hybrid
Markov chains. Electron. Comm. Probab., 2(2):13-25, 1997.






